On the efficient execution of bounded Jaro-Winkler distances

Authors

  • Kevin Dreßler
  • Axel-Cyrille Ngonga Ngomo
Abstract

Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. Thus, efficient and effective measures for comparing the labels of resources are central to facilitating the discovery of links between datasets on the Web of Data as well as their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData containing up to 10^6 strings and show that it scales linearly with the size of the data for large thresholds. Moreover, we show that our approach can be easily implemented in parallel. We also evaluate our approach against SILK and show that we outperform it even on small datasets.
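To make the idea of a bounded Jaro-Winkler measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Jaro-Winkler definition (prefix scale p = 0.1, prefix length capped at 4) and one simple length-based filter: since the number of matching characters cannot exceed the shorter string's length, the similarity has an upper bound that depends only on the two lengths. The function names (`jaro`, `jaro_winkler`, `jw_upper_bound`, `bounded_pairs`) are hypothetical and chosen for illustration.

```python
def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity: Jaro boosted by the common prefix length."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jw_upper_bound(len1, len2, p=0.1, max_prefix=4):
    """Upper bound on Jaro-Winkler computed from the string lengths only
    (assumption: a simple length filter, not necessarily the paper's)."""
    if len1 == 0 or len2 == 0:
        return 1.0 if len1 == len2 else 0.0
    short, long_ = sorted((len1, len2))
    # At most `short` characters can match, with no transpositions.
    jaro_ub = (2 * long_ + short) / (3 * long_)
    prefix_ub = min(max_prefix, short)
    return jaro_ub + prefix_ub * p * (1 - jaro_ub)


def bounded_pairs(source, target, threshold, eps=1e-12):
    """Return (s, t, score) for all pairs with score >= threshold,
    skipping pairs whose length-based bound is already too low."""
    result = []
    for s in source:
        for t in target:
            if jw_upper_bound(len(s), len(t)) + eps < threshold:
                continue  # cannot reach the threshold; skip the full computation
            score = jaro_winkler(s, t)
            if score >= threshold:
                result.append((s, t, score))
    return result
```

With a threshold of 0.9, for example, a pair such as "AL" and "ALEXANDRIA" is discarded by the length check alone, because its bound is about 0.79. The higher the threshold, the more pairs such a filter can skip, which is consistent with the abstract's remark about large thresholds.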

Similar resources

Time-efficient execution of bounded Jaro-Winkler distances

Over the last years, time-efficient approaches for the discovery of links between knowledge bases have been regarded as a key requirement towards implementing the idea of a Data Web. A considerable portion of the information available as RDF on the Web pertains to persons. Thus, efficient and effective measures for comparing names are central to facilitate the integration of informati...

On Flexible Web Services Composition Networks

The semantic Web service community develops efforts to bring semantics to Web service descriptions and allow automatic discovery and composition. However, there is no widespread adoption of such descriptions yet, because semantically defining Web services is highly complicated and costly. As a result, production Web services still rely on syntactic descriptions, key-word based discovery and pre...

New robust and secure alphabet pairing Text Steganography Algorithm

Steganography has been practiced since ancient times. Many linguistic steganography (popularly known as text-based steganography) algorithms have been proposed, such as Word Spacing, Substitution, Adjectives, Text Rotation, and Mixed Case Font. Information hiding effectively means that the method should be robust, secure, and have good embedding capacity. Measure of Similarity between cov...

Evaluating String Comparator Performance for Record Linkage

We compare variations of string comparators based on the Jaro-Winkler comparator and edit distance comparator. We apply the comparators to Census data to see which are better classifiers for matches and nonmatches, first by comparing their classification abilities using a ROC curve based analysis, then by considering a direct comparison between two candidate comparators in record linkage results.

Name Phylogeny: A Generative Model of String Variation

Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were inst...

Journal:
  • Semantic Web

Volume 8  Issue 

Pages  -

Publication date: 2017